Goto

Collaborating Authors

Supplementary Material: Neural Anisotropy Directions

Anonymous Author(s)

Neural Information Processing Systems

Note that, because the number of basis vectors parameterized by the imaginary coefficients is smaller, there are four gaps in Fig. S2. The results of this experiment are illustrated in Fig. Because the eigendecomposition of $I_D$ is isotropic, we can see that the logistic regression has no directional bias. Example 3 (Single hidden-layer neural network). Surprisingly, both algorithms yield very similar results, but the algorithm based on the eigendecomposition of the gradient covariance is numerically much more stable. Meanwhile, the gradient covariance only requires information about first-order gradients, and these are orders of magnitude larger than the second derivatives.
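
As a rough illustration of the first-order approach the excerpt favors, the sketch below estimates directional bias from the eigendecomposition of the empirical covariance of input gradients. It is a minimal reading of the idea, not the paper's actual algorithm; `model` and `inputs` are hypothetical placeholders.

```python
# A minimal sketch (not the paper's code): estimate directional bias from the
# eigendecomposition of the empirical covariance of first-order input
# gradients, the numerically stabler of the two approaches the excerpt
# compares. `model` and `inputs` are hypothetical placeholders.
import torch

def gradient_covariance_directions(model, inputs):
    """Eigenvalues/eigenvectors (descending) of the input-gradient covariance."""
    inputs = inputs.clone().requires_grad_(True)
    out = model(inputs).sum()                  # scalar surrogate objective
    (grads,) = torch.autograd.grad(out, inputs)
    g = grads.flatten(1)                       # one flattened gradient per sample
    g = g - g.mean(dim=0, keepdim=True)
    cov = g.T @ g / (g.shape[0] - 1)           # empirical gradient covariance
    eigvals, eigvecs = torch.linalg.eigh(cov)  # ascending order
    return eigvals.flip(0), eigvecs.flip(1)    # reorder to descending
```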


Closing the Curvature Gap: Full Transformer Hessians and Their Implications for Scaling Laws

Petrov, Egor, Kiselev, Nikita, Meshkov, Vladislav, Grabovoy, Andrey

arXiv.org Artificial Intelligence

The lack of theoretical results for Layer Normalization and feedforward Hessians has left a gap in the study of Transformer optimization landscapes. We address this by deriving explicit second-order expressions for these components, thereby completing the Hessian characterization of full Transformer blocks. Our results generalize prior self-attention analyses and yield estimates of the role each sublayer plays in curvature propagation. We demonstrate how these Hessian structures inform both convergence dynamics and the empirical scaling laws governing large-model performance. Further, we propose a Taylor-expansion-based framework for analyzing loss differences to quantify convergence trajectories. By extending Hessian theory to the full Transformer architecture, this work establishes a new foundation for theoretical and empirical investigations of optimization in large-scale deep learning.
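
The Taylor-expansion framework the abstract mentions amounts to estimating a loss difference as $L(\theta + d) - L(\theta) \approx \nabla L(\theta)^\top d + \tfrac{1}{2} d^\top H(\theta) d$. The following hedged sketch computes that second-order estimate with a single Hessian-vector product; `loss_fn` and the flat parameter vector `params` are hypothetical stand-ins, not the authors' implementation.

```python
# A hedged sketch of a second-order Taylor estimate of a loss difference:
# g^T d + 0.5 * d^T H d, with H @ d obtained via one Hessian-vector product.
# `loss_fn` is a hypothetical scalar loss of a flat parameter vector `params`
# (with requires_grad=True).
import torch

def taylor_loss_difference(loss_fn, params, direction):
    loss = loss_fn(params)
    (grad,) = torch.autograd.grad(loss, params, create_graph=True)
    (hvp,) = torch.autograd.grad(grad @ direction, params)  # H @ direction
    return (grad.detach() @ direction + 0.5 * direction @ hvp).item()
```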





Communication-Aware Map Compression for Online Path-Planning: A Rate-Distortion Approach

Pedram, Ali Reza, Psomiadis, Evangelos, Maity, Dipankar, Tsiotras, Panagiotis

arXiv.org Artificial Intelligence

This paper addresses the problem of collaborative navigation in an unknown environment, where two robots, referred to in the sequel as the Seeker and the Supporter, traverse the space simultaneously. The Supporter assists the Seeker by transmitting a compressed representation of its local map under bandwidth constraints to support the Seeker's path-planning task. We introduce a bit-rate metric based on the expected binary codeword length to quantify communication cost. Using this metric, we formulate the compression design problem as a rate-distortion optimization problem that determines when to communicate, which regions of the map should be included in the compressed representation, and at what resolution (i.e., quantization level) they should be encoded. Our formulation allows different map regions to be encoded at varying quantization levels based on their relevance to the Seeker's path-planning task. We demonstrate that the resulting optimization problem is convex, and admits a closed-form solution known in the information theory literature as reverse water-filling, enabling efficient, low-computation, and real-time implementation. Additionally, we show that the Seeker can infer the compression decisions of the Supporter independently, requiring only the encoded map content and not the encoding policy itself to be transmitted, thereby reducing communication overhead. Simulation results indicate that our method effectively constructs compressed, task-relevant map representations, both in content and resolution, that guide the Seeker's planning decisions even under tight bandwidth limitations. Autonomous navigation in unknown environments is essential for many real-world robotic applications, including search and rescue missions [1], agricultural surveys, and planetary exploration [2].
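
For intuition about the reverse water-filling solution the abstract invokes: each source component with variance $\sigma_i^2$ receives distortion $D_i = \min(\lambda, \sigma_i^2)$, with the water level $\lambda$ chosen to meet the total distortion budget. The sketch below is a generic illustration of that rule, not the paper's map-specific formulation; the variances and budget are hypothetical inputs.

```python
# A hedged sketch of classical reverse water-filling: region i, with
# (relevance-weighted) variance var[i], gets distortion min(lam, var[i]),
# where lam is bisected so the distortions meet the budget; the rate spent
# on region i is then 0.5 * log2(var[i] / D_i) bits. Illustrative only.
import numpy as np

def reverse_water_filling(variances, distortion_budget):
    var = np.asarray(variances, dtype=float)
    lo, hi = 0.0, var.max()
    for _ in range(100):                        # bisect on the water level
        lam = 0.5 * (lo + hi)
        if np.minimum(lam, var).sum() > distortion_budget:
            hi = lam
        else:
            lo = lam
    d = np.minimum(lam, var)                    # per-region distortions
    rates = 0.5 * np.log2(var / np.maximum(d, 1e-12))
    return d, np.maximum(rates, 0.0)            # bits allocated per region
```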


Preconditioned Langevin Dynamics with Score-Based Generative Models for Infinite-Dimensional Linear Bayesian Inverse Problems

Baldassari, Lorenzo, Garnier, Josselin, Solna, Knut, de Hoop, Maarten V.

arXiv.org Machine Learning

Designing algorithms for solving high-dimensional Bayesian inverse problems directly in infinite-dimensional function spaces - where such problems are naturally formulated - is crucial to ensure stability and convergence as the discretization of the underlying problem is refined. In this paper, we contribute to this line of work by analyzing a widely used sampler for linear inverse problems: Langevin dynamics driven by score-based generative models (SGMs) acting as priors, formulated directly in function space. Building on the theoretical framework for SGMs in Hilbert spaces, we give a rigorous definition of this sampler in the infinite-dimensional setting and derive, for the first time, error estimates that explicitly depend on the approximation error of the score. As a consequence, we obtain sufficient conditions for global convergence in Kullback-Leibler divergence on the underlying function space. Preventing numerical instabilities requires preconditioning of the Langevin algorithm, and we prove the existence and form of an optimal preconditioner. The preconditioner depends on both the score error and the forward operator and guarantees a uniform convergence rate across all posterior modes. Our analysis applies to both Gaussian and a general class of non-Gaussian priors. Finally, we present examples that illustrate and validate our theoretical findings.
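
A finite-dimensional caricature of the sampler under study may help: for a linear model $y = Ax + \varepsilon$ with Gaussian noise, the posterior score is the learned prior score plus the likelihood score, and each Langevin step is preconditioned by an operator $P$. The sketch below is exactly that caricature, under simplifying assumptions; `score`, `A`, `sigma`, and `P_sqrt` are hypothetical placeholders, and the optimal $P$ the paper derives is not reproduced here.

```python
# A minimal finite-dimensional sketch of preconditioned Langevin dynamics
# with an SGM prior for a linear Gaussian inverse problem. All inputs are
# hypothetical placeholders; this is an illustration, not the paper's sampler.
import numpy as np

def preconditioned_langevin(score, A, y, sigma, P_sqrt, x0, step, n_steps, rng):
    P = P_sqrt @ P_sqrt.T
    x = x0.copy()
    for _ in range(n_steps):
        # Posterior score = learned prior score + Gaussian likelihood score.
        s = score(x) + A.T @ (y - A @ x) / sigma**2
        noise = P_sqrt @ rng.standard_normal(x.shape)
        x = x + step * (P @ s) + np.sqrt(2.0 * step) * noise
    return x
```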


Low-Rank Optimal Transport through Factor Relaxation with Latent Coupling

Halmos, Peter, Liu, Xinhao, Gold, Julian, Raphael, Benjamin J

arXiv.org Machine Learning

Optimal transport (OT) is a general framework for finding a minimum-cost transport plan, or coupling, between probability distributions, and has many applications in machine learning. A key challenge in applying OT to massive datasets is the quadratic scaling of the coupling matrix with the size of the dataset. [Forrow et al. 2019] introduced a factored coupling for the k-Wasserstein barycenter problem, which [Scetbon et al. 2021] adapted to solve the primal low-rank OT problem. We derive an alternative parameterization of the low-rank problem based on the $\textit{latent coupling}$ (LC) factorization previously introduced by [Lin et al. 2021], which generalizes [Forrow et al. 2019]. The LC factorization has multiple advantages for low-rank OT, including decoupling the problem into three OT problems as well as greater flexibility and interpretability. We leverage these advantages to derive a new algorithm, $\textit{Factor Relaxation with Latent Coupling}$ (FRLC), which uses $\textit{coordinate}$ mirror descent to compute the LC factorization. FRLC handles multiple OT objectives (Wasserstein, Gromov-Wasserstein, Fused Gromov-Wasserstein) and marginal constraints (balanced, unbalanced, and semi-relaxed) with linear space complexity. We provide theoretical results on FRLC and demonstrate superior performance on diverse applications, including graph clustering and spatial transcriptomics, while also illustrating its interpretability.
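
To make the LC factorization concrete: the full plan is assembled as $P = Q \, \mathrm{diag}(1/g_Q) \, T \, \mathrm{diag}(1/g_R) \, R^\top$, where $Q$ and $R$ couple the outer marginals to inner marginals $g_Q$ and $g_R$, and $T$ couples the two inner marginals. The toy sketch below assembles such a plan and checks that the outer marginals are preserved; the shapes and values are illustrative, and this is not the FRLC solver itself.

```python
# A small numerical sketch of the latent coupling (LC) factorization:
# P = Q diag(1/gQ) T diag(1/gR) R^T. Values are illustrative only.
import numpy as np

def assemble_lc_plan(Q, T, R):
    gQ = Q.sum(axis=0)                      # inner marginal of Q
    gR = R.sum(axis=0)                      # inner marginal of R
    return Q @ np.diag(1.0 / gQ) @ T @ np.diag(1.0 / gR) @ R.T

# Toy check that the assembled plan keeps the outer marginals of Q and R.
rng = np.random.default_rng(0)
Q = rng.random((6, 2)); Q /= Q.sum()        # couples marginal a with gQ
R = rng.random((5, 2)); R /= R.sum()        # couples marginal b with gR
gQ, gR = Q.sum(axis=0), R.sum(axis=0)
T = np.outer(gQ, gR)                        # a valid coupling of (gQ, gR)
P = assemble_lc_plan(Q, T, R)
assert np.allclose(P.sum(axis=1), Q.sum(axis=1))  # marginal a preserved
assert np.allclose(P.sum(axis=0), R.sum(axis=1))  # marginal b preserved
```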


Decentralized Federated Learning with Gradient Tracking over Time-Varying Directed Networks

Nguyen, Duong Thuy Anh, Wang, Su, Nguyen, Duong Tung, Nedich, Angelia, Poor, H. Vincent

arXiv.org Artificial Intelligence

We investigate the problem of agent-to-agent interaction in decentralized (federated) learning over time-varying directed graphs, and, in doing so, propose a consensus-based algorithm called DSGTm-TV. The proposed algorithm incorporates gradient tracking and heavy-ball momentum to distributively optimize a global objective function, while preserving local data privacy. Under DSGTm-TV, agents update local model parameters and gradient estimates by exchanging information with neighboring agents, enabled through row- and column-stochastic mixing matrices, which we show guarantee both consensus and optimality. Our analysis establishes that DSGTm-TV exhibits linear convergence to the exact global optimum when exact gradient information is available, and converges in expectation to a neighborhood of the global optimum when employing stochastic gradients. Moreover, in contrast to existing methods, DSGTm-TV preserves convergence for networks with uncoordinated stepsizes and momentum parameters, for which we provide explicit bounds. These results enable agents to operate in a fully decentralized manner, independently optimizing their local hyperparameters. We demonstrate the efficacy of our approach via comparisons with state-of-the-art baselines on real-world image classification and natural language processing tasks.
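
For a sense of the update structure, the sketch below performs one synchronous iteration of gradient tracking with heavy-ball momentum: a row-stochastic matrix A mixes parameters, a column-stochastic matrix B mixes the gradient trackers, and each agent uses its own stepsize and momentum, mirroring the uncoordinated hyperparameters the abstract allows. This is an illustrative generic scheme, not the paper's exact DSGTm-TV update.

```python
# An illustrative sketch (not the paper's exact algorithm) of one step of
# gradient tracking with heavy-ball momentum over a directed network.
# X: (n_agents, dim) stacked parameters; Y: gradient trackers; grads holds
# grad_fn(X) from the previous step; alpha, beta: per-agent hyperparameters.
import numpy as np

def dsgt_momentum_step(X, X_prev, Y, grads, A, B, alpha, beta, grad_fn):
    # Mix parameters (row-stochastic A), descend along the tracker, add momentum.
    X_new = A @ X - alpha[:, None] * Y + beta[:, None] * (X - X_prev)
    grads_new = grad_fn(X_new)
    # Mix trackers (column-stochastic B) and inject the local gradient change,
    # so Y keeps tracking the network-wide average gradient.
    Y_new = B @ Y + grads_new - grads
    return X_new, X, Y_new, grads_new
```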